# Robot Control

π0+FAST uses FAST, an efficient action tokenization scheme for robotics developed by Physical Intelligence, and is suited to vision-language-action tasks.

Multimodal Fusion

STEVE R1 7B SFT I1 GGUF

This is a weighted/imatrix quantized version of the Fanbin/STEVE-R1-7B-SFT model, suitable for resource-constrained environments.

Text-to-Image English

Magma is a multimodal foundation model for AI agents that takes image and text inputs and generates text outputs, and supports complex interactions in both virtual and real-world environments.

Pi0 is a vision-language-action flow model for general robot control.

Multimodal Fusion

Minivla History2 Vq Libero90 Prismatic

MiniVLA is a compact yet high-performance vision-language-action model that is compatible with the Prismatic VLMs training scripts and suited to robotics and multimodal tasks.

Transformers English

VQ-BeT is a behavior generation model based on latent actions, trained for the PushT environment.

Image Generation

OpenVLA 7B is an open-source vision-language-action model trained on the Open X-Embodiment dataset, capable of generating robot actions based on language instructions and camera images.

Transformers English

HPT is a transformer model that aligns different embodiments into a shared latent space and investigates scaling behaviors in policy learning.

Multimodal Alignment

© 2025 AIbase